Cross-Lingual Taxonomy Alignment with Bilingual Biterm Topic Model

نویسندگان

  • Tianxing Wu
  • Guilin Qi
  • Haofen Wang
  • Kang Xu
  • Xuan Cui
چکیده

As more and more multilingual knowledge becomes available on the Web, knowledge sharing across languages has become an important task to benefit many applications. One of the most crucial kinds of knowledge on the Web is taxonomy, which is used to organize and classify the Web data. To facilitate knowledge sharing across languages, we need to deal with the problem of cross-lingual taxonomy alignment, which discovers the most relevant category in the target taxonomy of one language for each category in the source taxonomy of another language. Current approaches for aligning crosslingual taxonomies strongly rely on domain-specific information and the features based on string similarities. In this paper, we present a new approach to deal with the problem of cross-lingual taxonomy alignment without using any domain-specific information. We first identify the candidate matched categories in the target taxonomy for each category in the source taxonomy using the crosslingual string similarity. We then propose a novel bilingual topic model, called Bilingual Biterm Topic Model (BiBTM), to perform exact matching. BiBTM is trained by the textual contexts extracted from the Web. We conduct experiments on two kinds of real world datasets. The experimental results show that our approach significantly outperforms the designed state-of-the-art comparison methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Encoding Category Correlations into Bilingual Topic Modeling for Cross-Lingual Taxonomy Alignment

Cross-lingual taxonomy alignment (CLTA) refers to mapping each category in the source taxonomy of one language onto a ranked list of most relevant categories in the target taxonomy of another language. Recently, vector similarities depending on bilingual topic models have achieved the state-of-the-art performance on CLTA. However, these models only model the textual context of categories, but i...

متن کامل

Bilingual Segmented Topic Model

This study proposes the bilingual segmented topic model (BiSTM), which hierarchically models documents by treating each document as a set of segments, e.g., sections. While previous bilingual topic models, such as bilingual latent Dirichlet allocation (BiLDA) (Mimno et al., 2009; Ni et al., 2009), consider only cross-lingual alignments between entire documents, the proposed model considers cros...

متن کامل

Cross - lingual Information Retrieval Model based on Bilingual Topic Correlation ⋆

How to construct relationship between bilingual texts is important to effectively processing multi-lingual text data and cross language barriers. Cross-lingual latent semantic indexing (CL-LSI) corpus-based doesnot fully take into account bilingual semantic relationship. The paper proposes a new model building semantic relationship of bilingual parallel document via partial least squares (PLS)....

متن کامل

Cross-lingual Pseudo Relevance Feedback Based on Weak Relevant Topic Alignment

In this paper, a cross-lingual pseudo relevance feedback (PRF) model based on weak relevant topic alignment (WRTA) is proposed for cross language query expansion on unparallel web pages. Topics in different languages are aligned on the basis of translation. Useful expansion terms are extracted from weak relevant topics according to the bilingual term similarity. Experiment results on web-derive...

متن کامل

Cross Lingual Entity Linking with Bilingual Topic Model

Cross lingual entity linking means linking an entity mention in a background source document in one language with the corresponding real world entity in a knowledge base written in the other language. The key problem is to measure the similarity score between the context of the entity mention and the document of the candidate entity. This paper presents a general framework for doing cross lingu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016